intertwingly

It’s just data

JSON Interop


Python’s simplejson, in an apparent attempt to avoid Unicode issues, defaults to encoding all non-ASCII characters using JSON’s \uXXXX syntax.  Ironically, this causes problems with, of all languages, JavaScript:

$ js
js> load('json.js')
js> print("\u263A".toJSONString());
":"
js> print(unescape(encodeURIComponent("\u263A".toJSONString())));
"☺"

The second, rather unobvious, combination converts the Unicode string to UTF-8 and produces the correct result.  A workaround on the Python side would be:

$ python
>>> import simplejson
>>> simplejson.dumps(u"\u263A",ensure_ascii=False).encode('utf-8')
'"\xe2\x98\xba"'
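
For anyone who wants to see both encodings side by side, here is a minimal sketch. It assumes Python 3 and the standard library's json module rather than simplejson itself; the two share the ensure_ascii parameter, but the variable names are mine and the printed forms are Python 3's.

```python
import json

smiley = "\u263a"  # WHITE SMILING FACE

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes,
# so the output is pure ASCII.
escaped = json.dumps(smiley)

# ensure_ascii=False keeps the character as-is; encode to UTF-8 for the wire.
raw = json.dumps(smiley, ensure_ascii=False).encode("utf-8")

print(escaped)  # "\u263a"
print(raw)      # b'"\xe2\x98\xba"'

# Either form decodes back to the same one-character string.
assert json.loads(escaped) == json.loads(raw.decode("utf-8")) == smiley
```

Both outputs are valid JSON for the same string, so the choice only matters when the consumer mishandles one of the two forms.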

Update: bug 397215 has been opened on the SpiderMonkey shell, and a compile-time switch is already available to handle UTF-8 correctly. See the comments for details.
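
As a footnote to the js session above, here is a rough Python 3 emulation of what the two print calls appear to be doing. The big assumption, consistent with the ":" in the output but not something the session itself proves, is that the shell keeps only the low byte of each character when printing. The quote and unquote calls stand in for encodeURIComponent and unescape and only approximate them; that is enough for this one character.

```python
from urllib.parse import quote, unquote

smiley = "\u263a"

# Assumed shell behaviour: keep only the low byte of each code unit.
# 0x263A & 0xFF == 0x3A, which is ':'.
truncated = "".join(chr(ord(ch) & 0xFF) for ch in smiley)

# Stand-in for encodeURIComponent: percent-encode the UTF-8 bytes.
percent = quote(smiley, safe="")

# Stand-in for unescape: each %XX becomes the character with that code,
# so every character now fits in a single byte.
byte_chars = unquote(percent, encoding="latin-1")

# Low-byte truncation of those characters yields the UTF-8 byte sequence.
utf8_bytes = bytes(ord(ch) & 0xFF for ch in byte_chars)

print(truncated)   # :
print(percent)     # %E2%98%BA
print(utf8_bytes)  # b'\xe2\x98\xba'
```

Under that assumption the trick works because it moves the UTF-8 encoding step in front of the truncation, so nothing above 0xFF is left to lose.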